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models, but so far only at low significance. 

PACS numbers: 98.80.Cq IFT-UAM/CSIC-05-02, LAPTH-1085/05, DFPD/05/A05, astro-ph/0501477 



I. INTRODUCTION 

Following recent developments in observational cos- 
mology, particularly observations by the Wilkinson Mi- 
crowave Anisotropy Probe (WMAP) 0, there exist com- 
pelling reasons to talk about a Standard Cosmological 
Model based on the ACDM paradigm seeded with purely 
adiabatic perturbations. In addition, there have been 
many attempts to analyze more general models featuring 
additional physics, either to constrain such processes or 
in the hope of discovering some trace effects in the data. 
A case of particular interest is the possible addition of an 
admixture of isocurvature perturbations to the adiabatic 
ones 0, which has been studied in t he p ost- WMAP 
era by many authors 0, IE IE IS 

The bulk of investigations so far have as a starting 
point chosen a particular set of parameters to define 
the cosmological model under discussion, and then at- 
tempted to constrain those parameters using observa- 
tions, a process known as parameter fitting. Based on 
such analyses, many parameters are determined to a high 
degree of accuracy. Much less attention has been directed 
at the higher-level inference problem of allowing the data 
to decide the set of parameters to be us ed, known as 
model comparison or model selection [f3 . Il3l |. although 
such techniques have been widely deployed outside of as- 
trophysics. Recently, one of us applied two model selec- 
tion statistics, known as the Akaike and Bayesian Infor- 
mation Criteria, to some simple cosmological models , 
and showed that the simplest model considered was the 
one favoured by the data. These criteria have recently 
been applied to models with isocurvature perturbations 
by Parkinson et al. H, who concluded that the purely 
adiabatic model was favoured. 

Those statistics are not however full implementations 
of Bayesian inference, which appears to be the most ap- 
propriate framework for interpreting cosmological data. 
The correct model selection tool to use in that context is 
the Bayesian evidence [f3 , Il-'il| , which is the probability 



of the model in light of the data (i.e the average likeli- 
hood over the prior distribution). It has been deployed 
in cosmological contexts by several authors [l5j . and the 
ratio of evidences between two models is also known as 
the Bayes Factor ^E]. 1 The Bayesian evidence can be 
combined with prior probabilities for different models if 
desired, but even if the prior probabilities are assumed 
equal, the evidence still automatically encodes a prefer- 
ence for simpler models, implementing Occam's razor in 
a quantitative manner. 

Whenever one aims to decide whether or not a partic- 
ular parameter p should be fixed (for example at p = 0), 
one should use model selection techniques. If one car- 
ries out only a parameter-fitting exercise and then ex- 
amines the likelihood level at which p — is excluded, 
such a comparison fails to account for the model dimen- 
sionality being reduced by one at the point p = 0, and 
hence draws conclusions inconsistent with Bayesian in- 
ference. This typically overestimates the significance at 
which the parameter p is needed. An example is spectral 
index running, which parameter fitting favours at a mod- 
est (albeit unconvincing) confidence level LJ1, but which 
is disfavoured by model selection statistics [lj| . 

In this paper we use the Bayesian evidence to com- 
pare isocurvature and adiabatic models in light of cur- 
rent data. We will closely follow the notation of Beltran 
et al. 0], wn o recently carried out a parameter-fitting 
analysis of isocurvature models, and we use the same 
datasets. Wc follow the notation of that paper and pro- 
vide only a brief summary in this article. 



The difference in Bayesian Information Criterion can be used as 
a crude approximation to In (Bayes Factor), but the existence of 
parameter degeneracies in cosmological data fitting are likely to 
violate the conditions for the validity of the approximation. 
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II. BAYESIAN EVIDENCE 

A. Theoretical basis 

The Bayesian evidence is the average likelihood of a 
model over its prior parameter space, namely 

E = J C(9) pr(0) dff, (1) 

where 9 is the parameter vector defining the model, 
pr(0) the normalized priors on those parameters (typi- 
cally taken to be top- hat distributions over some range) , 
and C(9) is the likelihood. In essence, it asks the ques- 
tion: 'If I consider the possible model parameters I was 
allowing before I knew about this data, on average how 
well did they fit the data?'. Generally speaking, models 
with fewer parameters tend to be more predictive, and 
provided that for some parameter choices they fit the 
data well, then the average likelihood can be expected 
to be higher. On the other hand, a simple model which 
cannot fit the data for any parameter choices will not gen- 
erate a good likelihood. The Bayesian evidence therefore 
sets up the desired tension between model simplicity and 
ability to explain the data. 

Models are ranked in order of their Bayesian evidence, 
usually using its logarithm. The overall normalization 
is irrelevant. As the evidence is the (unnormalizcd) 
probability of the model, if two models are being com- 
pared, the odds of the one with the lower evidence is 
1/(1 + exp(Aln£7)). What constitutes a significant dif- 
ference is to some extent a matter of pers onal taste, 
but a useful guide is given by Jeffreys [l2j who rates 
Alni? < 1 as 'not worth more than a bare mention', 
1 < A\nE < 2.5 as 'substantial', 2.5 < A\nE < 5 
'strong' to 'very strong' and 5 < A\nE as 'decisive', in 
each case the decision being against the model with the 
smaller evidence. Note that a difference A\nE of 2.5 
corresponds to odds of 1 in about 13, and A In E of 5 to 
odds of 1 in 150. 

A significant, but unavoidable, disadvantage of the use 
of the evidence is that it depends on the prior ranges 
chosen for the parameters. For instance, if one doubles 
the range of one parameter by allowing it to vary in a 
region where the likelihood is negligibly small, then the 
evidence will half. Indeed, one can make any model dis- 
favoured simply by extending its prior range indefinitely 
in a direction where there is no hope of fitting the data. 
From a Bayesian point of view this is unsurprising; of 
course your belief in a model should be influenced by 
what you thought of it before the data came along, and 
the Bayesian analysis has the virtue of forcing you to 
make your assumptions explicit. 

However, the prior width is not as crucial as one might 
naively expect. The main reason is that the likelihood is 
typically falling off exponentially away from the best fit, 
while the parameter volume is growing only as a polyno- 
mial function. For example, consider a one-dimensional 



toy-model for which the likelihood is given by 

£(,)=£ exp(-^j , (2) 

and consider two models: model A is x = and model 
B is x with a top-hat prior < x < a. In the 
case [i = 1, a conventional 1-er non-detection, the evi- 
dence would be unable to strongly distinguish between 
the models (|AlnB| < 2.5) for up to a ~ 50. In the case 
fj, = 5, a conventional 5-cr detection, the evidence would 
prefer model B for all a J$ 5 x 10 4 . In other words, for 
reasonable prior ranges the evidence will robustly pick 
up the correct model. Its main advantage is that it is 
a quantitative measure with clear interpretation within 
Bayesian statistics, and can be applied in cases where the 
usual frequcntist arguments do not provide us with def- 
inite answers. Typically, Bayesian analysis contradicts 
the frequentist results whenever the latter accepts a pa- 
rameter in light of a marginally better \ 2 value. If this 
improvement is not significant, the increase of the vol- 
ume of the parameter space will penalize the addition of 
the new parameter and thus decrease the evidence of the 
extended model. 

Generally the evidence is not reparamctrization in- 
variant, in the sense that the choice of a flat prior 
in one parametrization will probably not correspond 
to a flat prior under another parametrization. The 
choice of parametrization is a matter of personal pref- 
erence, though obviously truly robust model selection 
results should be preserved under reasonable changes 
in parametrization. In the case of isocurvature pertur- 
bations there are different, equally plausible, choices of 
parametrization, in particular geared to dealing with the 
problem of the cross-correlation angle becoming uncon- 
strained as the isocurvature mode amplitude becomes 
small 0, [n| . For illustration we will compare the results 
obtained under two different parametrization choices. 

B. Numerical implementation 

The evidence for a given model can be computed by a 
Markov Chain Monte Carlo method. However it cannot 
be directly calculated from chains used in parameter es- 
timation (for instance from the program CosmoMC 
because those chains are sampled from the posterior dis- 
tribution, which is peaked around the maximum likeli- 
hood, and do not carry the necessary information on the 
likelihood far from the maximum. Equally, one cannot 
simply sample from the prior distribution, because the 
dominant contribution from the high-likelihood regions 
will not be properly sampled. Consequently, a hybrid 
technique is required, a useful method being thermody- 
namic integration [lSL fl9| . 

Thermodynamic integration alters the sampling of a 
Markov chain by introducing a parameter A, thought of 
as an inverse temperature, with the acceptance rate gov- 
erned by the likelihood raised to the power A. As A is var- 
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ied from zero to one, this interpolates between sampling 
from the prior and the posterior distributions. Defining 



E(X) = J C\6) pr(0) 



d9, 



(3) 



it can be shown that 
E(l) 



\nE = In 



where 



E(0) 



(InC). 



d\nE 
dX 



dX 



(ln£) x dX, (4) 



_ J\nCC x pr(6>) dO 
= / C x pr(6») dO 



(5) 



is the average of ln£ over the distribution at tempera- 
ture 1/A. That the priors in Eq. must be normalized 
implies that E(0) equals one, though the prior normal- 
ization anyway cancels out in the integrand Eq. J^J). 

Previous work in cosmology has typically evaluated the 
evidence during the burn-in phase of a chain to be used 
for parameter estimation. In this process, the tempera- 
ture is slowly cooled from A = to A = 1 to facilitate 
the relaxation of the chain into its stationary distribution 
and those chain elements are used for evidence computa- 
tion; they are then discarded and the remaining elements, 
all sampled at A = 1, are used for parameter estimation. 
This method is ideal for complex inference problems with 
dimensionality d 3> 1 and multimodal likelihood distribu- 
tions, where a slow burn-in phase is necessary to explore 
the posterior in an unbiased manner and thus the evi- 
dence calculation comes 'for free'. However, in a typical 
cosmological problem the likelihood surface is consider- 
ably simpler, arguably unimodal, and the number of sam- 
ples required for a reliable burn-in is much smaller than 
the number of samples needed for an accurate evidence 
estimation. Therefore, we choose a different approach 
in which we heat the chain, using the endpoint of a pa- 
rameter estimation run as the starting point. Since the 
volume of parameter space is larger at higher tempera- 
tures it should be much easier to ensure that the chain 
is stationary at each temperature step during heating 
rather than cooling. We implemented two different heat- 
ing schedules: 

• Continuous temperature change. We let the inverse 
sampling temperature change continuously at each 
step as 



X(n) = (l-O r 



(6) 



where n is the step number. The single sample 
taken at that temperature can be viewed as an un- 
biased (although noisy) estimate of (ln£)A. This 
continuous approach obviates the problem of decid- 
ing the number of steps per position, transferring 
it to the step size. When the algorithm decides to 
stop, the integral is closed to A = in the last step. 
The stopping criterion is that the closure of the in- 
tegral by the last step would change InE by less 



than a certain threshold, e s t op , even for the most 
extreme likelihood encountered. The choices of £ 
and e s top determine the accuracy and speed of the 
evidence calculator, and optimum values must be 
determined empirically. After trying various possi- 
bilities we settled for £ = 5xl0 -5 and e s t op = 0.001. 
We have tested that decreasing either £ or e s top fur- 
ther does not affect our results. 

• Stepwise temperature change. The integrand of 
Eq. (0} is first estimated at A = 1 and 0, then at 
intermediate temperatures given by 



— — , 

q n 



(7) 



(q is typically 1.5-2 and n an increasing inte- 
ger). The thermodynamic integral is calculated 
by the trapezoid rule after each additional point is 
added. The points are added until the integral con- 
verges to a user-specified stopping accuracy e s top- 
At each temperature the integral is calculated by 
making a short burn-in at that temperature (typi- 
cally 400 samples, since the chain must already be 
roughly burned in from the previous step) and then 
calculating (ln£)>, from a further number (typi- 
cally 1000) of accepted samples. This approach has 
the disadvantage that extra samples are needed for 
burn-in at each temperature and that there might 
be systematics associated with stepwise tempera- 
ture change. However, it is less sensitive to the 
quality of covariance matrix as a poorer covariancc 
matrix simply results in more samples being taken 
to get enough accepted samples (note that we can- 
not do the same for the continuous scheme without 
biasing the result, unless one is willing to burn- in 
at each 'continuous' temperature change step). 

Additionally, we modify the proposal function so that 
its width scales with A -1 (up to a certain width), which 
ensures that at high temperatures the chain is sampling 
randomly from the prior, rather than random-walking 
with the step-size corresponding to the A = 1 posterior. 

These two methods have been extensively tested to 
give results that are consistent and accurate to within a 
unit of InE for a single run. The final numbers for all 
models were calculated using the continuous temperature 
change method. Additionally we have performed a com- 
parison with an analytic approximation to the posterior 
and got results that are also consistent to better than one 
unit of \xiE in the adiabatic case, though slightly worse 
in the isocurvature case. 

In all cases we find that the number of samples required 
to accurately estimate the evidence and avoid systemat- 
ics associated with covariance matrices, proposal widths 
and similar is unexpectedly large; an order of magni- 
tude larger than what is required for a simple param- 
eter estimation. This makes the computation a chal- 
lenging task as it is limited by the speed of the likeli- 
hood evaluations which require generation of the model 
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Parameter 


Prior Range 


Model 




(0.018,0.032) 


AD-HZ,AD-n s ,ISO 




(0.04,0.16) 


AD-HZ,AD-n s ,ISO 


6 


(0.98,1.10) 


AD-HZ,AD-n a ,ISO 


T 


(0,0.5) 


AD-HZ,AD-n s ,ISO 


ln[10 10 ^ rad ] 


(2.6,4.2) 


AD-HZ,AD-n a ,ISO 


n s 


(0.8,1.2) 


AD-n s ,ISO 




(0,3) 


ISO 


^cor 


(-0.14,0.4) 


ISO 




(-1,1) 


ISO 


13 


(-1,1) 


ISO 



TABLE I: The parameters used in the models. The sound 
horizon 8 was used in place of the Hubble parameter. For 
the AD-HZ model n s was fixed to 1 and ni ao , S COI , a and /3 
were fixed to 0. In the AD-n s model, n s also varies. Every 
isocurvature model holds the same priors for the whole set of 
parameters. 



power spectra. This also suggests that the uncertainties 
on evidence values already found in the literature may 
be underestimated, though we note that the high quality 
of the WMAP data makes this task considerably more 
difficult than it was in the pre- WMAP era. Further in- 
vestigation into evidence estimation methods is clearly 
warranted and will be a focus of a forthcoming paper. 



III. EVIDENCE FOR ISOCURVATURE MODELS 

Our principal aim is to compare the evidence of isocur- 
vature models with purely adiabatic ones. We will follow 
the notation of Beltran et al. |10l. In general there are 
four types of isocurvature modes — cold dark matter 
isocurvature (CDI), baryon isocurvature (BI), neutrino 
isocurvature density (NID), and neutrino isocurvature 
velocity (NIV) — but the first two are observationally 
indistinguishable [f| so we ignore the baryon isocurva- 
ture case. These modes can exist in any combination, 
and with correlations both amongst themselves and with 
the adiabatic modes. We will only allow a single type of 
isocurvature mode in any model, though we will allow a 
general spectral index both for the isocurvature modes 
and for their correlation with the adiabatic ones. 

The flat prior ranges for all parameters are given in 
Table [I] We consider two adiabatic models. AD-HZ is 
the simplest model giving a good fit to the data, with 
a Harrison-ZePdovich spectrum and five variable param- 
eters. We also computed the evidence for an extended 
adiabatic model AD-n s in which we let n s vary. 

For each isocurvature model there are four extra pa- 
rameters. As in Ref. [ToT ] we parametrize the contribu- 
tion to the temperature and polarization angular power 
spectra from the adiabatic, isocurvature and correlation 
amplitudes at the pivot scale (k = 0.05 Mpc -1 ) by a 
and (3 so that: 

Ci = ( 1 - a) Cf + a Cf° + 2(3 ^a(l - a) Cf OT . (8) 



Model 


ln(Evidence) 


AD-HZ 


0.0 ± 0.1 


AD-ris 


0.0 ± 0.1 


CDI 


-1.0 ± 0.2 


NID 


-1.0 ± 0.2 


NIV 


-1.0 ± 0.3 



TABLE II: Evidences for the four different models studied, 
normalized to the AD-HZ evidence. The absolute value for 
that model was InE = —854.1. 

The parameter <5 cor is related to the spectral tilt of the 
correlation mode, n cor , and its boundaries are fixed by 
the pivot scale and the fe m i n = 4 x 10~ 5 Mpc^ 1 and 
tax = 0.5 Mpc -1 scales used for the analysis. It is 
defined as 

^r^ncor/ln^- 1 . (9) 

Thus the priors on the first seven parameters are theoret- 
ically motivated, whereas the priors on the last three are 
automatically set by the model. Throughout the analysis 
the equation of state parameter of the dark energy was 
set to —1. 

We have used the following datascts: cosmic mi- 
crowave anisotropy data from the WMAP satellite in- 
cluding temperature-polarization cross-correlation Q] , 
VSA CBI [13 and ACBAR 0, matter power spec- 
trum data from the two-degree field galaxy redshift sur- 
vey (2dFGRS) power spectrum p3T | and from the Sloan 
Digital Sky Survey |24|. and the supernovae apparent 
magnitude-redshift relation [25j . 

We ran 32 independent computations of the evidence 
for each model. In all of them the stopping criterion was 
satisfied after about 2.5 x 10 5 steps, so the total number of 
likelihood evaluations was approximately 10 7 per model. 
The results, given as the logarithm of the evidence, are 
described in Table HH We have expressed all the calcu- 
lated evidence values relative to the AD-HZ model, as 
the absolute value is just a particular of the likelihood 
code. We see from the table that the evidences are cal- 
culated to sufficient accuracy to draw conclusions, but 
that the comparison is rather inconclusive. Firstly, the 
two adiabatic models happen to produce the same evi- 
dence; as a further consistency check, we also looked at 
an adiabatic model with the prior range on n s doubled, 
and found that InE fell by 0.4, to be compared with the 
expected drop of In 2 that would appear if the likelihood 
were insignificant throughout the extended range. Sec- 
ondly, by coincidence all three isocurvature models have 
the same evidence, with Aln.E being 1.0 relative to AD- 
HZ in each case. According to the Jeffreys' scale this is 
just at the edge of being worthy of attention. 

As mentioned in Section II, these results are not 
reparametrization invariant, since changing the basis of 
parameters typically leads to a different choice of priors. 
Various parametcrizations have been used in the litera- 
ture. For instance, a change of pivot scale leads to an 
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Model 


ln(Evidence) 


AD-HZ 


0.0 ± 0.1 


CDI 


-1.0 ± 0.2 


NID 


-2.0 ± 0.2 


NIV 


-2.3 ± 0.2 



TABLE III: Evidences for the four models using the second 
parametrization, again normalized to the AD-HZ evidence. 

(n s — ?T-i so ) -dependent rescaling of a, and to an n cov ~ 
dependent rescaling of (3. Even if the pivot scale is fixed, 
various definitions of the amplitude parameters can be 
introduced. The normalization of the isocurvature mode 
can be parametrized by the ratio of isocurvature to adi- 
abatic primordial fluctuations /j SO £ [0, oo] Q instead 
of the fraction of isocurvature contribution to the total 
primordial spectrum a £ [0,1] ,7\- In this work, as in 
Ref. 01 j we chose to vary ^/a £ [—1, 1] in order to avoid 
dealing with boundary effects and to have a posterior 
distribution falling down to zero on the two ends of the 
prior range. We could nevertheless instead have chosen 
a flat prior for a. Similarly, the cross-correlation ampli- 
tude can be parametrized either by the correlation angle 
(3 £ [—1, 1], as in Refs. or by the amplitude of the 

cross-correlation power spectrum 2(3\f a(l — a) 0. The 
advantage of the latter is that the total power spectrum 
depends linearly on it, and so it is well constrained by 
the data, while starting from a flat prior on (3 we can 
get a flat posterior distribution if the preferred model is 
purely adiabatic, so that the value of (3 does not mat- 
ter (this point is discussed in detail in Ref. ^lj where 
a third choice is also introduced). Finally, we defined 
the parameter 5 COT in order to deal with a simple top-hat 
prior, but we could decide to use instead to impose a flat 
/?-dependent prior directly on n cor . 

To get a hint of the effect of reparametrization, we re- 
computed the evidences using a second parameter basis: 
instead of {\fa, (3) we vary (a, 2(3^J a(l — a)) with a flat 
prior inside the two-dimensional ellipse in which these 
parameters are defined, and instead of <5 cor we vary n cor 
within the range [-0.141n(|/3|- 1 ),0.41n(|/3|- 1 )]. Since 
the prior on n cor is too loose when f3 is close to zero, 
we imposed the additional prior over n COI £ [—1,1]. 

The results are quoted in Table IIIII and show differ- 
ences from the ones that use the original parametriza- 
tion. Even though the difference is still not big enough 
to exclude any isocurvature model, we conclude that, as 
mentioned in Section II, parametrization does matter for 
the evidence calculation. 



IV. CONCLUSIONS 

We have carefully calculated the evidence for two 
adiabatic models and three physically-distinguishable 
isocurvature models using recent cosmic microwave back- 
ground, supernovae and large-scale structure data. We 
find very similar evidences for all the models. For the first 
parametrization used, the odds of the isocurvature mod- 
els compared to the adiabatic ones are 1 in about 4. Using 
a second parametrization of the isocurvature parameters 
we find the odds for the neutrino cases drop to 1 in 10. 
Therefore, we conclude that present data are unable to 
offer a clear verdict for or against the inclusion of isocur- 
vature degrees of freedom. This conclusion is similar to 
that found by Parkinson ct al. using the information 
criteria. Although the extra parameters introduce extra 
complexity, these models are still able to satisfactorily 
fit the present data for a wide range of their parameters 
and thus the evidence quantifies the common sense that 
one should allow these models to be considered. We also 
showed the relevance of the parametrization for evidence 
computation. 

While the present comparison is inconclusive, a key 
question for future data will be to select between the 
adiabatic and isocurvature paradigms. Parameter es- 
timation analyses cannot do this, as even if the adia- 
batic model is correct they can only impose limits on the 
isocurvature parameters. The Bayesian model selection 
approach we have described is the ideal tool to carry out 
such a selection. 
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